REPORT DOCUMENTATION PAGE




AFOSR-TR-97 


1. AGENCY USE ONLY (Leave Blank)


2. REPORT DATE
31 October 1997


3. REPORT TYPE AND DATES COVERED

Final, 1 September 1995 to 31 August 1997


4. TITLE AND SUBTITLE
CNN Universal Chip
Final Report


5. FUNDING NUMBERS
F49620-93-1-0519


6. AUTHORS

J. Bokor and Leon Chua


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Electronics Research Laboratory
558 Cory Hall
University of California at Berkeley
Berkeley, CA 94720


9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES)
AFOSR/NI
ATTN: AASERT PROGRAM
110 Duncan Ave, Room B115
Bolling AFB, DC 20332-8050


8. PERFORMING ORGANIZATION REPORT NUMBER
UCB/ERL-97/1


10. SPONSORING / MONITORING AGENCY REPORT NUMBER


12a. DISTRIBUTION / AVAILABILITY STATEMENT
APPROVED FOR PUBLIC RELEASE.
DISTRIBUTION UNLIMITED.


12b. DISTRIBUTION CODE


13. ABSTRACT (Maximum 200 words)

The goal of designing and building a high-performance CNN Universal Machine integrated circuit with demonstrable image processing capabilities was
achieved during the contract funding period. The design of the CNN Universal Chip architecture was guided by theoretical results, developed under this
contract, that explicate the robust elemental processing capabilities of such cellular analog hardware. Furthermore, the analogic algorithms needed to
implement the specific image processing techniques required in several target application areas were designed.




14. SUBJECT TERMS
None


15. NUMBER OF PAGES
5


16. PRICE CODE


17. SECURITY CLASSIFICATION
OF REPORT
UNCLASSIFIED


NSN 7540-01-280-5500


18. SECURITY CLASSIFICATION
OF THIS PAGE
UNCLASSIFIED


19. SECURITY CLASSIFICATION
OF ABSTRACT
UNCLASSIFIED


20. LIMITATION OF ABSTRACT


Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. Z39-1
298-102




AASERT


FINAL TECHNICAL REPORT

CONTRACT F49620-93-1-0519

AASERT PROGRAM


Principal Investigator: Jeffrey Bokor


Electronics Research Laboratory
University of California, Berkeley


October 1997


Final Technical Report F49620-93-1-0519


Abstract 

The goal of designing and building a high-performance CNN Universal Machine integrated circuit
with demonstrable image processing capabilities was achieved during the contract funding period.
The design of the CNN Universal Chip architecture was guided by theoretical results, developed under
this contract, that explicate the robust elemental processing capabilities of such cellular analog
hardware. Furthermore, the analogic algorithms needed to implement the specific image processing
techniques required in several target application areas were designed.

Technical Summary
CNN Universal Chip

The designed and fabricated chip is the first continuous-time universal CNN chip that can both input
and output gray-scale images and locally store and recall images. To date, it is also the CNN universal
chip with both the largest number of parallel multipliers (4,608) and the fastest CNN dynamics
(60 ns). The entire circuit is full custom except for some of the digital gates.

The chip is composed of a main array of 16 x 16 CNN Universal cells. Each cell contains local
analog and logic memory, and is locally interconnected to its eight neighboring cells. The chip
allows analog input of images and both logic and analog output. The following list shows some of
the key features of the chip.

•  Continuous-time analog dynamics

•  Full 3 x 3 intercell connectivity

•  Nineteen (19) independent analog template elements

•  Parallel evaluation of the feedforward and feedback connections

•  Two (2) local analog memories

•  Two (2) local logic memories

•  One 2-input, 1-output local logic gate with 4-bit control

•  State voltage is not clipped

•  Analog output buffer inside every cell, so the internal analog dynamics are not slowed down or
otherwise affected by external loads on the chip

•  Row-by-row input of analog images

•  Row-by-row output of logic images

•  Serial output of analog images

•  Capability of stopping the cell dynamics

•  Capability of sensing out any selected cell during the analog transient processing

•  Capability of internally presetting all cells to black or white

•  Capability of loading each cell's initial state or input with any boolean combination of the data
stored in the two local logic memories

It was designed to the following specifications (summary):

•  Parallel loading time of analog images is 8 ns per column

•  Parallel readout time of logic images is 10 ns per column

•  Serial readout time of analog values is 60 ns per cell

•  Analog processing time constant is 56 ns/27 ns

•  Holding time of the local analog memories is 1 ms (2% accuracy)

•  Local logic memories do not need refreshing

•  Accuracy of multipliers is 1%

•  Supply voltages are +2.5 V and -2.5 V

•  Power dissipation is 0.5 mW/cell

•  Number of transistors per cell is 203

•  Area per cell is 232 µm x 263.5 µm, using 1.0 µm line width in an HP 0.8 µm technology

•  Equivalent digital operations per second per cell for a typical CNN operation with non-zero
feedforward plus non-zero non-propagating feedback template is 807 MegaOPS/cell

•  Equivalent digital operations per second per square centimeter of silicon for a typical CNN
operation with non-zero feedforward plus non-zero non-propagating feedback template is 1.3
TeraOPS/cm²
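As a rough consistency check, the chip-level figures implied by the per-cell specifications above can be computed directly. The Python sketch below rests on several assumptions. It takes a 16 x 16 array and assumes that per-cell figures scale linearly with cell count. It assumes a full image is loaded or read out in 16 column steps. It also assumes 18 multipliers per cell: nine feedback weights and nine feedforward weights, with the bias as the nineteenth template element. The constant and function names are illustrative only.

```python
# Hypothetical consistency check: chip-level figures derived from the
# per-cell specifications. Assumes a 16 x 16 array and linear scaling.

ROWS, COLS = 16, 16
CELLS = ROWS * COLS                  # 256 cells

# Assumption: 9 feedback + 9 feedforward multipliers per cell; the bias is
# the 19th template element and needs no multiplier.
MULTIPLIERS_PER_CELL = 18

POWER_PER_CELL_W = 0.5e-3            # 0.5 mW/cell
CELL_WIDTH_CM = 232e-4               # 232 um
CELL_HEIGHT_CM = 263.5e-4            # 263.5 um
OPS_PER_CELL = 807e6                 # 807 MegaOPS/cell

LOAD_S_PER_COLUMN = 8e-9             # parallel analog loading
LOGIC_READ_S_PER_COLUMN = 10e-9      # parallel logic readout
ANALOG_READ_S_PER_CELL = 60e-9       # serial analog readout


def chip_figures() -> dict:
    """Scale the per-cell figures up to the whole 16 x 16 chip."""
    cell_area_cm2 = CELL_WIDTH_CM * CELL_HEIGHT_CM
    return {
        "multipliers": CELLS * MULTIPLIERS_PER_CELL,
        "power_W": CELLS * POWER_PER_CELL_W,
        "ops_per_s_per_cm2": OPS_PER_CELL / cell_area_cm2,
        "image_load_s": COLS * LOAD_S_PER_COLUMN,
        "logic_readout_s": COLS * LOGIC_READ_S_PER_COLUMN,
        "analog_readout_s": CELLS * ANALOG_READ_S_PER_CELL,
    }


if __name__ == "__main__":
    for name, value in chip_figures().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the multiplier count comes out to 4,608 and the area-based estimate to about 1.32 x 10^12 operations per second per cm², in line with the figures quoted above. Reading out a full 16 x 16 analog image serially at 60 ns per cell would take roughly 15.4 µs.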

The designed chip was fabricated in small quantity. Using the specified chip protocols, a custom
board was built at the Hungarian Academy of Sciences to interface the CNN chip with the CNN chip
prototyping system (CCPS), which allows CNN algorithms to be executed on CNN chips. Using this
system, several image processing tasks were verified, thereby validating the functionality of the
fabricated chips. Testing and characterization of the chips are still underway, and the results are
being used to inform next-generation designs.

The performance of systems based on the CNN universal chip can be increased further by using
analog RAM (ARAM) chips designed specifically to communicate at high speed with the main CNN
universal chip. Work on the basic design of such ARAMs began during this funding period.

Theoretical Results: Processing Capabilities

We have shown that the simplest integrated circuit implementations of the CNN Universal Machine
can play the ‘game of life’ and are therefore equivalent to a Turing Machine. In addition, a
constructive proof is given for the direct implementation of general first-order cellular automata on
such machines.

Determining whether the simplest CNN Universal Machine implementation retains universality in the
sense of Turing is an important question, since complicated, non-standard cells were not
implemented in the first CNN Universal Chip designs. Our results show that CNN Universal Machines
with only single-layer continuous-time CNN dynamics, linear templates, local logic memory, and a
local logic unit can implement the game of life and are therefore, in fact, universal. In addition,
we found a general constructive proof that the same architecture can directly implement arbitrary
first-order cellular automata, whereas past approaches have used discrete-time operation with complex
nonlinearities, multiple layers, or time-varying templates.
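The following Python sketch illustrates how linear templates and a two-input local logic operation can compute one generation of the game of life. It uses one possible decomposition, which is not necessarily the construction used in this work. The feedforward weight is 1 on each of the eight neighbors and 0.5 on the center, so the weighted sum is s = n + 0.5c, where n is the number of live neighbors and c is the center value. A cell is alive in the next generation exactly when 2.5 <= s <= 3.5. Two uncoupled threshold transients on the same template, followed by a local AND NOT, therefore suffice. The 0/1 cell encoding, the toroidal boundary, and the function names are assumptions for illustration.

```python
import random
from typing import List

Grid = List[List[int]]

# Illustrative feedforward (B) template: 1 on the eight neighbors, 0.5 on
# the center, so the weighted sum is s = n + 0.5 * c.
B = [[1.0, 1.0, 1.0],
     [1.0, 0.5, 1.0],
     [1.0, 1.0, 1.0]]


def threshold(grid: Grid, bias: float) -> Grid:
    """One uncoupled transient: output 1 where B * u + bias > 0 (toroidal grid)."""
    rows, cols = len(grid), len(grid[0])

    def s(i: int, j: int) -> float:
        return sum(B[di + 1][dj + 1] * grid[(i + di) % rows][(j + dj) % cols]
                   for di in (-1, 0, 1) for dj in (-1, 0, 1))

    return [[1 if s(i, j) + bias > 0 else 0 for j in range(cols)]
            for i in range(rows)]


def life_step_threshold(grid: Grid) -> Grid:
    at_least = threshold(grid, bias=-2.25)   # s >= 2.5
    too_many = threshold(grid, bias=-3.75)   # s >= 4
    # Local two-input logic: AT_LEAST AND NOT TOO_MANY, i.e. 2.5 <= s <= 3.5.
    return [[a & (1 - t) for a, t in zip(row_a, row_t)]
            for row_a, row_t in zip(at_least, too_many)]


def life_step_reference(grid: Grid) -> Grid:
    """Conway's rule applied directly, for comparison."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            n = sum(grid[(i + di) % rows][(j + dj) % cols]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0))
            row.append(1 if n == 3 or (grid[i][j] == 1 and n == 2) else 0)
        out.append(row)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    g = [[rng.randint(0, 1) for _ in range(12)] for _ in range(12)]
    assert life_step_threshold(g) == life_step_reference(g)
    print("threshold decomposition matches Conway's rule on a random grid")
```

Both transients use the same template and differ only in bias. In this sketch, each generation therefore needs two template operations and one local logic operation, and no multi-layer or time-varying templates.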

Furthermore, our research revealed that many essential image processing and pattern formation
effects can be implemented on a very simple CNNUM architecture. By examining the dynamics in the
frequency domain while all CNN cells operate in the linear region, one can show that the mechanisms
for infinite impulse response (IIR) spatial filtering, pattern formation, morphogenesis, and
synergetics are present, even though each cell has only first-order dynamics. The method also
explains many of the standard CNN templates, such as the nonlinear ‘averaging’, ‘halftoning’, and
‘diffusion’ templates, in a new light. Through many examples, it has been shown how
generalizations of these templates can be used to design linear and nonlinear filters for image
processing tasks such as low-pass filtering, time-varying spatial filtering, and fingerprint
enhancement.
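For reference, the linear-region argument can be sketched as a short derivation. It assumes the standard CNN state equation with unit resistance and capacitance, space-invariant 3 x 3 feedback and feedforward templates A and B, a bias z, and a static input u. It also assumes an array large enough (or wrapped periodically) that boundary effects can be ignored. Cells are indexed by (m, n) and template offsets by (k, l). X and U denote the two-dimensional spatial Fourier transforms of the state and input, taken with kernel exp(-j(m ω1 + n ω2)). This is a textbook-style sketch, not the specific analysis carried out under this contract.

```latex
\begin{aligned}
\dot{x}_{mn} &= -x_{mn} + \sum_{k,l=-1}^{1} A_{kl}\, y_{m+k,\,n+l} + \sum_{k,l=-1}^{1} B_{kl}\, u_{m+k,\,n+l} + z,
  \qquad y_{mn} = x_{mn} \ \text{(linear region)} \\[4pt]
\hat{A}(\omega_1,\omega_2) &= \sum_{k,l=-1}^{1} A_{kl}\, e^{\mathrm{j}(k\omega_1 + l\omega_2)},
  \qquad \hat{B}(\omega_1,\omega_2) = \sum_{k,l=-1}^{1} B_{kl}\, e^{\mathrm{j}(k\omega_1 + l\omega_2)} \\[4pt]
\frac{\partial X(\omega_1,\omega_2,t)}{\partial t} &= \bigl(\hat{A}(\omega_1,\omega_2) - 1\bigr)\, X(\omega_1,\omega_2,t)
  + \hat{B}(\omega_1,\omega_2)\, U(\omega_1,\omega_2),
  \qquad (\omega_1,\omega_2) \neq (0,0) \\[4pt]
H(\omega_1,\omega_2) &= \frac{X_\infty(\omega_1,\omega_2)}{U(\omega_1,\omega_2)}
  = \frac{\hat{B}(\omega_1,\omega_2)}{1 - \hat{A}(\omega_1,\omega_2)},
  \qquad \text{provided } \operatorname{Re}\hat{A}(\omega_1,\omega_2) < 1 \text{ for all } (\omega_1,\omega_2)
\end{aligned}
```

The uniform bias z affects only the zero-frequency component. The feedback template enters through the denominator, so each cell's steady state depends on the input over the whole array. The result is an infinite impulse response spatial filter, even though both templates are only 3 x 3. Any spatial frequency at which the real part of the transform of A exceeds 1 grows until cells leave the linear region, which is the kind of instability behind the pattern-formation effects mentioned above. For symmetric templates both transforms are real.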

Since the demonstrated mechanisms form a basis for many practical image processing algorithms, the
CNN Universal Chip must, at a minimum, have the capabilities required to implement them
efficiently. Specifically, we have learned that a local analog addition operation is required in the
discrete-time feedback at each cell, a feature not included in previous designs. As a result of this
research, the next-generation CNN Universal Chip will also contain a concave nonlinearity, such as
the absolute value function. On the other hand, we also learned that many advanced features, such
as multiple layers and large template sizes, are not necessary, an observation that will free chip
area for more CNN cells.

Robust Analogic Algorithms

Algorithms that use both analog and binary processing have been found to provide extremely powerful
solutions to problems in image deblurring and image compression. They have also spurred the
development of new technologies for building CNN Universal Machine chips capable of implementing
these algorithms.

For example, a CNN Universal Machine chip could offer a solution to the real-time image deblurring
problem in microscopy. Ideally, a CNN chip with optical inputs would be placed at the image plane of
the microscope. The CNN would contain a “program” to perform the deconvolution along with other
image processing steps such as object counting or edge detection, all on the order of a microsecond.
Analogic algorithms for the following image processing tasks were designed to solve problems in
microscopy: deconvolution, illumination estimation and contrast enhancement, noise filtering, and
multi-scale decompositions. This research has driven the extension of the CNN Universal Machine to
include analog feedback and local storage.

Neighborhood logic functions and cellular automata are known to be useful in modeling physical
systems, morphological image processing, and random number generation. High-speed CNN-based binary
morphological image processing has practical implications for improving low signal-to-noise imaging
applications, for instance IR night vision for military or security applications. Using a CNN-based
processor, such images could be enhanced in real time for target recognition or human
interpretation. Binary morphology is also used in next-generation video compression algorithms.
Reliable, high-speed, reproducible random number generation on the CNN could be useful in many
military and civilian contexts, including spread-spectrum communications, video encryption, and
real-time simulation of physical systems.

To implement morphological image processing and cellular automata on the CNN Universal Machine,
arbitrary logic functions of an input neighborhood must be computed. The CNN architecture computes
weighted sums of this neighborhood using a ‘B-template’, so it is limited to threshold logic: a
logical operation computed by a single transient must belong to the class of linearly separable
boolean functions. It was shown previously how a general neighborhood logic function can be
implemented on the CNNUM by cascading component functions from this class, namely by directly
implementing the minterm or maxterm formulation of the desired function. However, for functions of a
3 x 3 input neighborhood this method may require up to 256 stages. A 3 x 3 binary neighborhood has
2^9 = 512 input patterns, and whichever formulation has fewer terms needs at most half of them.
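The minterm construction and the 256-stage bound can be checked with a short Python sketch. Each minterm of a 3 x 3 binary neighborhood is itself a threshold function. Its weights are +1 where the target pattern is on and -1 where it is off, and its bias is 1/2 - k, where k is the number of on cells in the pattern, so it fires only on that pattern. The sketch realizes a truth table as an OR of minterms or an AND of maxterms, whichever needs fewer stages. The helper names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Pattern = Tuple[int, ...]                                   # nine 0/1 values, row-major 3 x 3
PATTERNS: List[Pattern] = list(product((0, 1), repeat=9))   # all 512 inputs


def minterm_unit(p: Pattern) -> Tuple[List[int], float]:
    """Threshold unit (weights in {-1, +1}, bias) that fires only on pattern p."""
    weights = [1 if bit else -1 for bit in p]
    bias = 0.5 - sum(p)        # w . x exceeds sum(p) - 0.5 only when x == p
    return weights, bias


def fires(weights: List[int], bias: float, x: Pattern) -> int:
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def cascade(truth: Dict[Pattern, int]) -> Tuple[int, Callable[[Pattern], int]]:
    """Return (number of stages, evaluator) for the shorter of the two forms."""
    ones = [p for p in PATTERNS if truth[p]]
    zeros = [p for p in PATTERNS if not truth[p]]
    if len(ones) <= len(zeros):
        units = [minterm_unit(p) for p in ones]

        def evaluate(x: Pattern) -> int:   # OR of minterms
            return int(any(fires(w, b, x) for w, b in units))
    else:
        units = [minterm_unit(q) for q in zeros]

        def evaluate(x: Pattern) -> int:   # AND of maxterms = NOT (OR of zero-minterms)
            return int(not any(fires(w, b, x) for w, b in units))
    return len(units), evaluate


if __name__ == "__main__":
    parity = {p: sum(p) % 2 for p in PATTERNS}
    majority = {p: int(sum(p) >= 5) for p in PATTERNS}
    for name, table in (("parity", parity), ("majority", majority)):
        stages, evaluate = cascade(table)
        assert all(evaluate(p) == table[p] for p in PATTERNS)
        print(f"{name}: {stages} stages")
```

The nine-input parity function needs the full 256 stages. So does the nine-input majority function, even though majority is itself a single linearly separable threshold function. This gap illustrates why richer component functions can shorten the cascade considerably.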

Under this grant, we proposed a more efficient method for implementing general logic functions on
the CNNUM and on other hardware capable of performing a threshold logic function of an input
neighborhood. The class of component functions considered is a superset of the minterms and
maxterms. For purposes of searchability, ease of implementation, and robustness, it is also a
subset of the general linearly separable boolean functions. We have formulated an algorithm that
finds a sequence of weight-restricted threshold logic functions (B-templates with weights from
{-1, 0, +1} and a bias) that yields the desired boolean function when the functions are cascaded
together using two-input logical operations. In addition, we have written a computer program that
uses this algorithm to design CNN Universal Machine algorithms (template sequences) implementing
arbitrary neighborhood boolean logic functions.
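A template sequence of the kind described above can be evaluated with the following Python sketch. Each stage is a weight-restricted threshold function of the 3 x 3 neighborhood. Its output is merged with the running result by a two-input logic operation given as a 4-bit truth table, loosely mirroring the chip's 2-input local logic gate with 4-bit control. The bit ordering of that code, the class and function names, and the example sequence are assumptions for illustration. The sketch only evaluates a given sequence and does not reproduce the search algorithm.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    """One template in a sequence (hypothetical encoding)."""
    weights: Tuple[int, ...]   # nine entries from {-1, 0, +1}, row-major 3 x 3
    bias: float
    op: int                    # 4-bit truth table: bit (2*acc + new) = op(acc, new)

    def __post_init__(self) -> None:
        if len(self.weights) != 9 or any(w not in (-1, 0, 1) for w in self.weights):
            raise ValueError("weights must be nine values from {-1, 0, +1}")
        if not 0 <= self.op <= 15:
            raise ValueError("op must be a 4-bit code")


def run(stages: Sequence[Stage], x: Sequence[int]) -> int:
    """Evaluate a template sequence on one 0/1 neighborhood x."""
    acc = 0
    for stage in stages:
        new = 1 if sum(w * xi for w, xi in zip(stage.weights, x)) + stage.bias > 0 else 0
        acc = (stage.op >> (2 * acc + new)) & 1   # look up op(acc, new)
    return acc


COPY_NEW = 0b1010      # op(acc, new) = new
AND_NOT_NEW = 0b0100   # op(acc, new) = acc AND NOT new

# Example: XOR of the north (index 1) and center (index 4) cells.
NORTH_CENTER = tuple(1 if k in (1, 4) else 0 for k in range(9))
XOR_NORTH_CENTER: List[Stage] = [
    Stage(NORTH_CENTER, bias=-0.5, op=COPY_NEW),     # north OR center
    Stage(NORTH_CENTER, bias=-1.5, op=AND_NOT_NEW),  # ... AND NOT (north AND center)
]

if __name__ == "__main__":
    assert all(run(XOR_NORTH_CENTER, x) == x[1] ^ x[4]
               for x in product((0, 1), repeat=9))
    print("two restricted-weight stages realize north XOR center")
```

The example realizes XOR of the center and north cells from two restricted-weight stages. XOR is not linearly separable, so no single transient can compute it.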

An important concern that motivated the restricted-weight paradigm is that the analog multipliers
may not be accurate enough to implement general threshold logic robustly. We have begun testing our
boolean neighborhood functions on the new chip and expect them to perform more robustly than general
threshold logic functions that do not use restricted weights.
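One way to make the robustness concern concrete is to compute, for a given threshold function, the largest relative error in every weight that cannot flip its output for any 0/1 input. The Python sketch below assumes independent relative weight errors bounded by eps, an exact bias, and no other noise. It compares two hypothetical units. The first is a restricted-weight, minterm-style unit with weights +1 on eight cells and -1 on one, and bias -7.5. The second is a general threshold function with power-of-two weights that acts as an unsigned comparator, v >= 300.

```python
from itertools import product
from typing import Sequence


def tolerable_error(weights: Sequence[float], bias: float) -> float:
    """Largest eps such that relative weight errors |dw_i| < eps*|w_i| can
    never flip the unit's output on any 0/1 input (bias assumed exact)."""
    worst = float("inf")
    for x in product((0, 1), repeat=len(weights)):
        exposure = sum(abs(w) for w, xi in zip(weights, x) if xi)
        if exposure == 0:
            continue    # only the exact bias acts on this input
        margin = abs(sum(w * xi for w, xi in zip(weights, x)) + bias)
        worst = min(worst, margin / exposure)
    return worst


# Restricted weights: minterm-style unit firing only when eight cells are on
# and the ninth is off.
RESTRICTED = ([1] * 8 + [-1], -7.5)
# General threshold function with power-of-two weights: fires iff v(x) >= 300.
GENERAL = ([2 ** i for i in range(9)], -299.5)

if __name__ == "__main__":
    for name, (w, b) in (("restricted", RESTRICTED), ("general", GENERAL)):
        print(f"{name}: tolerates about {100 * tolerable_error(w, b):.2f}% weight error")
```

Under these simplified assumptions, the restricted-weight unit tolerates weight errors up to about 5.6% (0.5/9), comfortably above the 1% multiplier accuracy in the specifications. The power-of-two comparator tolerates only about 0.17% (0.5/300).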